Counterfactual Data-Fusion for Online Reinforcement Learners

Authors

  • Andrew Forney
  • Judea Pearl
  • Elias Bareinboim
Abstract

The Multi-Armed Bandit problem with Unobserved Confounders (MABUC) considers decision-making settings where unmeasured variables can influence both the agent’s decisions and received rewards (Bareinboim et al., 2015). Recent findings showed that unobserved confounders (UCs) pose a unique challenge to algorithms based on standard randomization (i.e., experimental data); if UCs are naively averaged out, these algorithms behave sub-optimally, possibly incurring infinite regret. In this paper, we show how counterfactual-based decision-making circumvents these problems and leads to a coherent fusion of observational and experimental data. We then demonstrate this new strategy in an enhanced Thompson Sampling bandit player, and support our findings’ efficacy with extensive simulations.
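The gap between observational, experimental, and counterfactual decision-making described above can be illustrated with a small exact calculation. The sketch below is a toy example, not the authors' algorithm or their Thompson Sampling player: the payout table, the uniform confounder, and the assumption that the agent's intent coincides with the unobserved confounder `U` are all hypothetical, and the names (`PAYOUT`, `intent`, `counterfactual_value`, and so on) are introduced here only for illustration.

```python
# Toy MABUC with two arms and a binary unobserved confounder U.
# All numbers are hypothetical, chosen only to make the effect visible.
P_U = {0: 0.5, 1: 0.5}  # P(U = u)

# PAYOUT[x][u] = P(Y = 1 | X = x, U = u)
PAYOUT = {
    0: {0: 0.1, 1: 0.5},
    1: {0: 0.5, 1: 0.1},
}
ARMS = tuple(PAYOUT)


def intent(u):
    """Arm the agent would pick on its own; assumed here to equal U."""
    return u


def observational_value():
    """Expected reward when the agent simply follows its intent."""
    return sum(P_U[u] * PAYOUT[intent(u)][u] for u in P_U)


def experimental_values():
    """E[Y | do(X = x)]: randomization averages U out for each arm."""
    return {x: sum(P_U[u] * PAYOUT[x][u] for u in P_U) for x in ARMS}


def intent_specific_values():
    """E[Y_x | intent = i]: reward of arm x among units whose intent is i."""
    values = {}
    for i in ARMS:
        units = [u for u in P_U if intent(u) == i]
        mass = sum(P_U[u] for u in units)
        values[i] = {
            x: sum(P_U[u] * PAYOUT[x][u] for u in units) / mass for x in ARMS
        }
    return values


def counterfactual_value():
    """Expected reward when the best arm is chosen separately per intent."""
    table = intent_specific_values()
    p_intent = {i: sum(P_U[u] for u in P_U if intent(u) == i) for i in ARMS}
    return sum(p_intent[i] * max(table[i].values()) for i in ARMS)
```

In this toy setup, following intent yields an expected reward of 0.1, either arm under randomization yields 0.3, and choosing the best arm for each intent yields 0.5. Averaging the confounder out therefore hides the better intent-specific policy, which is the kind of sub-optimality the abstract attributes to purely experimental algorithms.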

Similar resources

Comprehension of factual, nonfactual, and counterfactual conditionals by Iranian EFL learners

A considerable number of studies have been conducted on conditional reasoning, supporting the mental model theory of propositional reasoning. Mental model theory, proposed by Johnson-Laird and Byrne, explains a person's thought process about how something occurs in the real world. Conditional reasoning is a kind of reasoning used to speak about possibilities or probabilities. The a...

Valence biases factual and counterfactual learning in opposite directions

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the ...

Reinforcement learning and counterfactual reasoning explain adaptive behavior in a changing environment

Animals routinely adapt to changes in the environment in order to survive. Though reinforcement learning may play a role in such adaptation, it is not clear that it is the only mechanism involved, as it is not well suited to producing rapid, relatively immediate changes in strategies in response to environmental changes. This research proposes that counterfactual reasoning might be an additiona...

Counterfactual thinking and anticipated emotions enhance performance in computer skills training

The present study examined the relationship between novice learners' counterfactual thinking (i.e. generating what if and if only thoughts) about their initial training experience with a computer application and subsequent improvement in task performance. The role of anticipated emotions towards goal attainment in task performance was also assessed. Undergraduate students (N = 42) with minimal ...

Counterfactual reasoning as a key for explaining adaptive behavior in a changing environment

It is crucial for animals to detect changes in their surrounding environment, and reinforcement learning is one of the well-known processes used to explain change-detection behavior. However, reinforcement learning itself cannot fully explain rapid, relatively immediate changes in strategy in response to abrupt environment changes. A previous model employed reinforcement learning and counterfac...

Publication date: 2017